Psychometric properties of the Chinese version of the PROMIS-Cancer-Anxiety item bank assessed using a graded response model

Objective This study aimed to examine the psychometric properties of the Chinese version of the Patient-Reported Outcome Measurement Information System (PROMIS)-Cancer-Anxiety item bank using a graded response model in a sample of patients with cancer. Methods A cross-sectional study was conducted and the Chinese version of the PROMIS-Cancer-Anxiety item bank was used to measure anxiety in patients with cancer. The unidimensional structure of the item bank was evaluated using principal component analysis. Residual correlations and the graphs of item mean scores conditional on the rest scores were examined to evaluate the local independence and monotonicity of the items, respectively. Item characteristics were described using item parameter estimates and item information. Operating characteristic curves (OCCs) and test information curve (TIC) were also plotted. Measurement invariance across age, gender, and education level was assessed to identify possible differential item functioning (DIF). Results A total of 1075 patients with cancer were enrolled. Under the assumptions of unidimensionality, local independence, and monotonicity, the discrimination parameters a ranged from 2.30 to 5.47, and the threshold parameters b ranged from b1 = −2.87 to b4 = 3.21 with proper intervals. Completely overlapped category curves were not observed among the OCCs of any items. Item information and TIC showed that the item bank had a wide measurement range. The DIFs for age, gender, and education level for all items were not remarkable. Conclusions The results supported using the Chinese version of the PROMIS-Cancer-Anxiety item bank to measure anxiety and develop a computerized adaptive testing (CAT) system for anxiety in patients with cancer.


Introduction
Cancer is the leading cause of morbidity and mortality worldwide. 3,4 As a multifactorial symptom, cancer-related anxiety decreases patient quality of life and influences treatment adherence. 5,6 Evidence has also shown that clinically diagnosed anxiety disorders are associated with increased cancer incidence, higher cancer-specific mortality, and poorer cancer survival. 7 Accurate anxiety assessments are therefore necessary for patients with cancer.
The early assessment of anxiety has clinical and public health benefits for cancer prevention and treatment. 7 It is recommended that all health care providers routinely assess the presence of emotional distress and specific anxiety symptoms from the point of diagnosis onward. 8 The use of a valid and reliable tool for anxiety assessment with clinically meaningful, reportable scores is warranted; however, the abundance and heterogeneity of anxiety measures make it difficult to compare individual scales. 9 Variations in item content and context, including timeframe, response scales, and the number of response categories, further complicate this task. 9 The Patient-Reported Outcome Measurement Information System (PROMIS) addresses these issues and is easily accessible, with flexible administration and standardized scoring (www.nihpromis.org). The PROMIS-Cancer-Anxiety item bank (PROMIS-A) is a cancer-specific version of the PROMIS Chronic Illness banks that has undergone six rigorous development processes: domain identification and definition, qualitative item review, refinement of items for cancer populations, field testing, psychometric data analysis, and evaluation of item banks. 10 To complete an efficient assessment of anxiety among patients with cancer using the PROMIS instruments, short forms and computerized adaptive testing (CAT) are the main administration options; 11 of the two, the CAT system has higher measurement efficiency and accuracy. To achieve this, using an item response theory (IRT)-calibrated item bank is key. 12 In the measurement process, the CAT system chooses items from the bank that best match the person's level of latent trait (θ); it skips unnecessary items, enabling personalized testing for each patient and a more precise representation of the patient's trait. 12,13 Because of this, constructing a high-quality item bank is crucial for effectively implementing and strengthening CAT systems.
IRT models the probability of selecting a particular item response category as a function of the person's level of θ and item characteristics and examines the performance of individual items of the scale and the appropriateness of response categories. 11,14 The graded response model (GRM) proposed by Samejima is a commonly applied unidimensional IRT model applicable to Likert scale items. 15 The GRM also has a wide range of applications with the PROMIS, as it provides guidance for forming PROMIS short forms and has profound implications for developing PROMIS CAT. 12,16,17 For example, the Italian custom four-item short form selected from the PROMIS Anxiety Form 8a was formed based on analysis of the GRM, 17 and the Dutch-Flemish version of the PROMIS Anxiety CAT was developed within the analytical framework of the GRM. 12 Currently, the PROMIS-A has not been evaluated in Chinese patients with cancer, and a corresponding CAT system has not been developed. Thus, this study aimed to assess the psychometric properties of the PROMIS-A among patients with cancer under the guidance of the GRM.
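To make the GRM concrete, the following minimal Python sketch computes the five category probabilities for one Likert item from a discrimination parameter a and four thresholds. It assumes the standard logistic form of Samejima's model; the function name and the parameter values are hypothetical and are not taken from this study's estimates.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Return P(X = k | theta) for each response category under the GRM.

    thresholds must be in increasing order; an item with four thresholds
    has five response categories.
    """
    # Cumulative probabilities of responding in category k or higher.
    # By definition the lowest category has cumulative probability 1,
    # and "above the highest category" has probability 0.
    cum = (
        [1.0]
        + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
        + [0.0]
    )
    # Each category probability is the difference of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Hypothetical item (illustrative values only, not from Table 2).
probs = grm_category_probs(theta=0.0, a=2.5, thresholds=[-1.5, -0.5, 0.5, 1.5])
```

Plotting these probabilities over a grid of θ values gives the operating characteristic curves for the item; a larger a makes the curves steeper.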

Samples
Using convenience sampling, a cross-sectional study was conducted at Fudan University Shanghai Cancer Center and Fudan University Affiliated Zhongshan Hospital in Shanghai, Mainland China, from November 2020 to July 2021. Patients were eligible if they had been clinically diagnosed with cancer, were aged ≥18 years, knew their disease diagnosis, and could speak and write in Mandarin. Patients with cognitive deficits or critical medical conditions were excluded.

Measures
The general information questionnaire
Demographic and clinical characteristics, including age, gender, education level, marital status, religious beliefs, long-term residence, employment status, cancer type, cancer metastasis, antineoplastic therapy, chronic diseases, and complications, were obtained with the general information questionnaire. The demographic characteristics were self-reported by the patients, and the clinical variables were checked against the electronic medical records by trained nurse investigators.

The PROMIS-A
PROMIS-A is a cancer-specific version of the PROMIS anxiety item bank. 10 Consisting of 23 items, the PROMIS-A assesses a wide range of anxiety-related feelings and symptoms within a 7-day recall period. 18 Respondents were asked to rate how often they experienced these unpleasant feelings or symptoms on a five-point Likert scale ("Never" = 1, "Rarely" = 2, "Sometimes" = 3, "Often" = 4, "Always" = 5); a higher score indicated a higher level of anxiety. Authorization to translate the PROMIS-A into Chinese was obtained from the PROMIS Health Organization (PHO), and translation and cultural adaptation of the PROMIS-A was completed according to the Functional Assessment of Chronic Illness Therapy (FACIT) translation methodology. 19 Related cognitive debriefing interviews for the PROMIS-A were conducted for patients with cancer in China to ensure the cultural equivalence of the instrument. 20

Data analysis
Descriptive statistics, including the demographic and clinical characteristics of the sample and the distribution of PROMIS-A scores, were obtained using IBM SPSS version 26.0. Cronbach's α was used to measure internal consistency, with a value higher than 0.70 indicating good internal consistency. 21 The homogeneity of the PROMIS-A was supported if Cronbach's α did not increase significantly when any item was removed and if the item-scale correlation coefficients were higher than 0.40. 22
Three main assumptions of the GRM, namely unidimensionality, local independence, and monotonicity, were evaluated in this study. With the exception of unidimensionality, the assumptions were evaluated using R version 4.2.2. Principal component analysis was used to examine the unidimensional structure of the instrument, on the premise that the Kaiser-Meyer-Olkin statistic was close to 1 and the P-value of Bartlett's test of sphericity was less than 0.05; the assumption of unidimensionality was considered acceptable if the ratio of the first component characteristic root (eigenvalue) to the second component characteristic root exceeded three. 23,24 Local independence of the items was supported by residual correlations lower than 0.70, and monotonicity was supported by graphs of item mean scores conditional on the rest scores (ie, the total raw score minus the item score). 25,26
Under the GRM, the psychometric evaluation of the PROMIS-A, including model fit, item parameter estimates, operating characteristic curves (OCCs), item information, test information curve (TIC), and differential item functioning (DIF), was conducted using R. Model fit was assessed using the root mean square error of approximation (RMSEA), comparative fit index (CFI), and Tucker-Lewis index (TLI). 27 An RMSEA lower than 0.06 and CFI and TLI values higher than 0.95 were used as the criteria for a good fit. 28
The items of the PROMIS-A were characterized using the discrimination parameter a and the four threshold parameters b1 to b4. The discrimination parameter a indicates the extent to which the item can differentiate between persons with similar θ estimates, and the four threshold parameters b1 to b4 give the values of θ at which a person has a 50% probability of responding in a given category or higher rather than in a lower category. 11,14 When the value of the a-parameter was lower than 0.50, the item was assumed to be non-discriminating. When the absolute value of a b-parameter was higher than 10, the item was considered inappropriate for the respondents. 14 The four threshold parameters, b1 to b4, should increase gradually at intervals between 0.81 and 5 to show that the difficulty settings for adjacent response categories are reasonable. 29 The OCCs were plotted to further illustrate the relationships between θ level and item response probability. 14 The maximum information for each item was calculated, and the item information corresponding to a medium level of anxiety, where the person's level of θ was 0, was provided. The TIC was plotted to show the test information that the PROMIS-A can provide for respondents with different anxiety levels. 14
To assess whether an item had significant systematic errors due to group bias, both uniform DIF and non-uniform DIF were assessed for age (18-59 years and ≥60 years), gender (male and female), and education level (junior high school or below and high school or above). Ordinal logistic regression (OLR) methods were used to identify the presence of DIF. 30 To quantify the magnitude of DIF, a change in McFadden's pseudo R² higher than 0.02 was set as the critical value for rejecting the hypothesis of no DIF. 30
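The DIF criterion above can be illustrated with a short Python sketch. It assumes the common three-model OLR scheme (model 1: trait only; model 2: adds the group term; model 3: adds the group-by-trait interaction), which the text does not spell out, and it takes already-fitted log-likelihoods as input. The function names and log-likelihood values are hypothetical.

```python
def mcfadden_r2(loglik_model, loglik_null):
    """McFadden's pseudo R-squared: 1 - LL(model) / LL(intercept-only)."""
    return 1.0 - loglik_model / loglik_null


def dif_r2_changes(ll_null, ll_m1, ll_m2, ll_m3):
    """Pseudo R-squared changes between nested OLR models for one item.

    ll_m1: trait only; ll_m2: trait + group; ll_m3: trait + group + interaction.
    (Assumed model sequence; see the lead-in.)
    """
    r1, r2, r3 = (mcfadden_r2(ll, ll_null) for ll in (ll_m1, ll_m2, ll_m3))
    return {
        "uniform": r2 - r1,      # effect of adding the group term
        "non_uniform": r3 - r2,  # effect of adding the interaction
        "total": r3 - r1,
    }


# Hypothetical log-likelihoods for a single item (illustrative only).
changes = dif_r2_changes(ll_null=-1500.0, ll_m1=-900.0, ll_m2=-894.0, ll_m3=-893.0)
flagged = changes["total"] > 0.02  # critical value used in this study
```

With these made-up values the total change stays well below 0.02, so the item would not be flagged even if the likelihood-ratio tests between the models were statistically significant.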

Ethical considerations
Ethical approval for this study was provided by the institutional review boards of Fudan University Shanghai Cancer Center (IRB No. 1810192-22) and Fudan University Affiliated Zhongshan Hospital (IRB No. 2020-076R). Trained researchers invited eligible patients to participate. The patients were informed about the aims and procedures of the study and the voluntary nature of participation. Patients signed informed consent forms and completed the investigation using paper-based questionnaires. While the participants filled in the questionnaires, the researchers answered any questions they raised and, where necessary, explained items without leading the participants toward any response. After all questionnaires were completed, their integrity was checked.

Descriptive statistics
A total of 1100 questionnaires were sent out, and 1075 were completed. The effective recovery rate was 97.73%. The demographic and clinical characteristics of the study population are shown in Table 1. The mean age of the population was 53.62 years [standard deviation (SD) = 13.48, range: 24-87 years]. Most patients were female (62.0%) and had been diagnosed with gynecological cancer (43.2%). The average PROMIS-A score was 69.55 (SD = 16.53, range: 36-108). The PROMIS-A also showed a high internal consistency reliability (Cronbach's α = 0.98), which remained unchanged regardless of which item was omitted from the item bank. Convergent validity was acceptable, with correlation coefficients between items and the item bank greater than 0.40 (range: 0.75-0.89, P < 0.05).
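As an illustration of the internal consistency statistics reported above, the sketch below computes Cronbach's α and the α obtained when one item is removed, using only the Python standard library. The helper names and the toy response matrix are hypothetical; the study itself used SPSS.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the total (sum) score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


def alpha_if_deleted(rows, j):
    """Cronbach's alpha after dropping item j (0-based index)."""
    reduced = [r[:j] + r[j + 1:] for r in rows]
    return cronbach_alpha(reduced)


# Hypothetical responses: 5 respondents x 3 items on a 1-5 scale.
toy = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]
alpha = cronbach_alpha(toy)
alphas_without = [alpha_if_deleted(toy, j) for j in range(3)]
```

Comparing `alpha` with each entry of `alphas_without` mirrors the homogeneity check: an item is a concern when removing it raises α noticeably.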

Assumptions for the GRM
First, the statistic of the Kaiser-Meyer-Olkin test was 0.962, and the P-value of Bartlett's test of sphericity was lower than 0.001. The ratio of the first component characteristic root to the second component characteristic root was 13.11, indicating that the PROMIS-A conformed to the assumption of unidimensionality. Second, all item pairs were assumed to be locally independent, with residual correlations ranging from −0.40 to 0.62. Third, the assumption of monotonicity was supported, as the graphs of item mean scores conditional on the rest scores showed a gradually increasing trend for all items.

Model fit and item parameter estimates
The model fit indices indicated a good fit of the GRM (RMSEA = 0.05, TLI = 0.97, and CFI = 0.96). The results of the PROMIS-A item parameter estimates are shown in Table 2. The discrimination parameter a was higher than 0.50 for all items, ranging from a = 2.28 to a = 5.47. The threshold parameter b was also in an acceptable range, ranging from b1 = −2.94 to b4 = 3.21. In addition, the four threshold parameters increased gradually within every item. Except for the interval between b1 and b2 in the item EDANX09, which was slightly lower than 0.81, all intervals between adjacent b-parameters were between 0.81 and 5.
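The threshold spacing rule can be checked mechanically, as in the following Python sketch. The function name is hypothetical, and the threshold values are invented for illustration; they are chosen only so that the first interval falls just below 0.81, and they are not the estimates in Table 2.

```python
def check_threshold_spacing(thresholds, min_gap=0.81, max_gap=5.0):
    """Check ordering and spacing of adjacent GRM threshold parameters.

    Returns (ordered, out_of_range), where out_of_range lists the 1-based
    index pairs (k, k + 1) whose interval lies outside [min_gap, max_gap].
    """
    gaps = [hi - lo for lo, hi in zip(thresholds, thresholds[1:])]
    ordered = all(g > 0 for g in gaps)
    out_of_range = [
        (i + 1, i + 2)
        for i, g in enumerate(gaps)
        if not (min_gap <= g <= max_gap)
    ]
    return ordered, out_of_range


# Hypothetical thresholds b1..b4 (illustrative only).
ordered, flagged_pairs = check_threshold_spacing([-1.20, -0.46, 0.60, 1.90])
```

Here the thresholds are ordered, and only the (b1, b2) pair is reported as falling outside the 0.81 to 5 window.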

The OCCs of the PROMIS-A
The probabilities of respondents with different levels of θ choosing any category scored from 1 to 5 are shown in Fig. 1. Each line represents one response category on the five-point Likert-type response scale. The x-axis represents different levels of θ. The y-axis represents the probability of selecting the response category. For all the items of the PROMIS-A, the category curves of any single item were not entirely overlapped by other curves, indicating that each category in the items played a proper role in the measurement.

Item information
The maximum amount of information provided by each item ranged from 1.38 to 7.49 (Table 3). Ten out of the twenty-three items had maximum information higher than 3.0, and the corresponding levels of θ ranged between −2.3 and 2.0. In addition, the item information when the person's level of θ was 0 was close to the corresponding maximum information, indicating that the PROMIS-A was suitable for testing cancer patients with medium anxiety levels.
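For readers who want to see how item information follows from the item parameters, the sketch below evaluates the GRM item information function on a grid of θ values. It assumes the standard logistic GRM; the function name, grid, and parameter values are hypothetical and do not reproduce Table 3.

```python
import math


def grm_item_information(theta, a, thresholds):
    """Item information of a graded response item at a given theta."""
    # Cumulative probabilities of responding in category k or higher,
    # padded with 1 (lowest category) and 0 (beyond the highest category).
    cum = (
        [1.0]
        + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
        + [0.0]
    )
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]  # category probability
        # Derivative of the category probability with respect to theta.
        slope = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        info += slope ** 2 / p_k
    return info


# Hypothetical thresholds shared by two items that differ only in a.
bs = [-1.5, -0.5, 0.5, 1.5]
grid = [i / 10 for i in range(-30, 31)]  # theta from -3.0 to 3.0
peak_low = max(grm_item_information(t, 2.0, bs) for t in grid)
peak_high = max(grm_item_information(t, 5.0, bs) for t in grid)
```

With identical thresholds, the item with the larger discrimination parameter reaches a higher peak, which matches the pattern described for the most and least informative items.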

The TIC of the PROMIS-A
As shown in Fig. 2, the test information varied with the θ level. The curve indicated that when the level of θ was around 0, the test information reached its highest value of 63.427. In addition, the PROMIS-A was highly informative within a wide range of anxiety levels, with θ between −3.0 and 3.0. The test information decreased rapidly when θ fell outside the range of −2.5 to 1.5.
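Test information translates directly into measurement precision, because the standard error of the θ estimate is the reciprocal of the square root of the information. The sketch below applies this to the peak value reported above. The reliability conversion additionally assumes that θ is scaled to unit variance, which is an assumption of this example rather than a statement from the study; the function names are hypothetical.

```python
import math


def standard_error(test_information):
    """Standard error of the theta estimate implied by test information."""
    return 1.0 / math.sqrt(test_information)


def approx_reliability(test_information):
    """Approximate reliability, assuming theta has unit variance."""
    return 1.0 - 1.0 / test_information


peak_information = 63.427  # peak test information reported in the text
se_at_peak = standard_error(peak_information)
reliability_at_peak = approx_reliability(peak_information)
```

At the peak, the standard error is roughly 0.13 on the θ scale. Under the same assumption, a test information of 10 corresponds to a reliability of 0.90.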

DIF
Uniform and non-uniform DIF were present in the PROMIS-A. Five items (EDANX27, EDANX01, EDANX33, EDANX46, and EDANX09) were identified as having uniform DIF concerning age. Three items (EDANX05, EDANX51, and EDANX54) were identified as having uniform DIF concerning gender, and one item (EDANX02) was identified as having non-uniform DIF. Additionally, EDANX55 exhibited uniform DIF for education level, and EDANX27 exhibited non-uniform DIF. However, the combined impact was negligible for all DIF variables, with changes in McFadden's pseudo R² lower than 0.02.

Discussion
Studies focusing on testing anxiety item banks are important for promoting the efficient application of patient-reported outcome measurements (PROMs). 12,31 To the best of our knowledge, this is the first study to evaluate the psychometric properties of the Chinese version of the PROMIS-A using the GRM in a sample of individuals with cancer.
The items of the PROMIS-A had appropriate thresholds. All items in the PROMIS-A showed high discrimination, indicating that this scale can effectively distinguish and identify different levels of anxiety among patients with cancer. Compared to the Brazilian (discrimination range of 1.10 to 2.47), Dutch-Flemish (discrimination range of 1.34 to 3.59), and American (discrimination range of 1.26 to 3.86) versions of the PROMIS-Anxiety item bank, the Chinese version of the PROMIS-A showed a higher discrimination range (2.28 to 5.47) when applied to a population of patients with cancer. 12,32 This discrepancy might be attributed to cultural differences and the focused application of the PROMIS-A in patients with cancer. Significant differences in patient-reported health measured with PROMIS profile 29 were also observed in the UK, France, and Germany. 33 The level of item discrimination was reflected in the corresponding OCCs. Lower discrimination led to flatter OCCs, while higher discrimination gave steeper OCCs. 14,34 All items showed an appropriate threshold range, with no cases in which the threshold parameters were particularly high or low; however, the item EDANX09 (I had unpleasant thoughts that wouldn't leave my mind) had a b1 and b2 interval of 0.74, slightly lower than 0.81, indicating that the difference between "rarely" and "sometimes" in the response category for this item was small. Similar results were found in the Brazilian and American versions of the PROMIS-Anxiety item bank, with several items having b1 and b2 intervals lower than 0.81. 12,32 Duan et al 35 claimed that it is difficult for individuals to make clear judgments about feelings, such as unpleasant thoughts; nevertheless, the threshold parameter intervals for other items in the PROMIS-A were within an appropriate range. Considering the need to ease the measurement burden on patients, the wording of the response categories was kept consistent within the item bank. 16
The value of information reflects the amount of information an item can provide when estimating a person's level of θ. 36 Here, the maximum information for all items in the PROMIS-A was calculated along with the corresponding information for items when a person's level of θ was 0. Ten out of the twenty-three items had maximum information higher than 3.0, and the corresponding levels of θ ranged between −2.3 and 2.0; this indicated that these items captured general levels of anxiety and that the PROMIS-A was suitable for measuring patients with cancer exhibiting medium levels of anxiety. Moreover, item information is closely related to the discrimination and threshold parameters, and items with higher discrimination values generally provide more information about the underlying θ. 37 In this study, the item EDANX05 (I felt anxious) showed the highest maximum information of 7.49, corresponding to the highest discrimination parameter of 5.47; the item EDANX47 (I felt indecisive) showed the lowest maximum information of 1.38, corresponding to the lowest discrimination parameter of 2.28. A low-difficulty item was expected to perform poorly in differentiating between respondents at the high end of θ because they could easily meet the condition and score high on the item, while a higher-difficulty item was expected to perform poorly in differentiating between respondents at the low end of θ because it was difficult for them to meet the condition. The spread of item information and the location on the θ scale at which the item information reaches its maximum are determined by the category threshold parameters. 37 Previous studies that used IRT to conduct personalized analyses of scale items recommended that items with more information be selected to create shorter measurement tools that may perform nearly as well as the original longer tools. 37,38
The TIC depicted in this study further represents the relationship between the level of test information and the measurement precision, as greater amounts of test information led to smaller standard errors of measurement. Consistent with other language versions, the Chinese version of the PROMIS-A applied to patients with cancer also provided high amounts of test information across a wide range of anxiety levels. 9,39
Test fairness is critical for the validity of group comparisons involving different characteristics, and DIF analysis can help ensure test fairness. In this study, both uniform and non-uniform DIF for age, gender, and education level were detected in the Chinese version of the PROMIS-A among patients with cancer. Consistent with this finding, de Castro et al 32 detected statistically significant DIF for age and gender in the Brazilian version of the PROMIS-Anxiety item bank. Age-related DIF was also detected in the PROMIS-Anxiety item bank for American people. 30 Although DIF items have appeared in different versions of the PROMIS-Anxiety item bank, the magnitude of the DIF was low in most cases. Furthermore, Flens et al 12 reported that the Dutch-Flemish version of the PROMIS-Anxiety item bank presented no DIF items for age, gender, and education level, and all the items were included in the subsequent CAT simulation.
Four schemes have been proposed for items showing DIF: deleting, ignoring, multiple-group modeling, and modeling DIF as a secondary dimension. 40 However, the best method for handling items that exhibit DIF remains unclear. Liu et al 41 compared these four methods and suggested that different treatments should be used for different assessment purposes. Ignoring DIF items is the priority if the magnitude of DIF is small, whereas item parameter calibration is of the greatest interest for a short test. 40,41 Additionally, the detection of DIF may be influenced by several factors, such as sparse data and small sample sizes. 42 The combined impact of the differences tends to be negligible when weighted by a focal group trait distribution that contains only a small number of respondents. 30 Because of this, the treatment of DIF should be approached with caution for practical purposes. 43
This study has several limitations. First, the samples were collected using convenience sampling at two tertiary hospitals in Shanghai. Failure to recruit patients with cancer from other areas of China may have impaired the representativeness of the sample population. Second, the PROMIS-A was scaled such that the US general population had a mean score of 50 with an SD of 10; 16 however, this study did not investigate cross-cultural validity or the available reference scores. DIF detection for language and linear transformation for calibrated scores could be adopted for a standard measurement of anxiety and horizontal comparisons between countries in the future. In addition, measuring anxiety in the real world is relatively complex, and multiple factors might have influenced the measurement process. Testing the item bank in combination with other advanced models is necessary to ensure accurate assessment.

Conclusions
This psychometric investigation supported using the Chinese version of the PROMIS-A to measure moderate anxiety levels in patients with cancer. Using the GRM, sufficient psychometric information was obtained on individual items in the PROMIS-A, which enabled the PROMIS-A metric to be used while maintaining comparability with previous studies. The Chinese version of the PROMIS-A was found to be a suitable question source for the future development of a CAT system for patients with cancer.

Table 1
Demographic and clinical characteristics of the study population (N = 1075).

Table 2
Item parameter estimates of the PROMIS-A.

Table 3
Item information of the PROMIS-A.